A Clustering Method Based on the Maximum Entropy Principle

Authors

  • Edwin Aldana-Bobadilla
  • Ángel Fernando Kuri Morales
Abstract

Clustering is an unsupervised process to determine which unlabeled objects in a set share interesting properties. The objects are grouped into k subsets (clusters) whose elements optimize a proximity measure. Methods based on information theory have proven to be feasible alternatives. They are based on the assumption that a cluster is a subset with the minimal possible degree of “disorder”, so they attempt to minimize the entropy of each cluster. We propose a clustering method based on the maximum entropy principle. Such a method explores the space of all possible probability distributions of the data to find one that maximizes the entropy subject to extra conditions based on prior information about the clusters. The prior information is based on the assumption that the elements of a cluster are “similar” to each other in accordance with some statistical measure. As a consequence of such a principle, those distributions of high entropy that satisfy the conditions are favored over others. Searching the space to find the optimal distribution of objects in the clusters represents a hard combinatorial problem, which rules out the use of traditional optimization techniques. Genetic algorithms are a good alternative to solve this problem. We benchmark our method relative to the best theoretical performance, which is given by the Bayes classifier when data are normally distributed, and a multilayer perceptron network, which offers the best practical performance when data are not normal. In general, a supervised classification method will outperform a non-supervised one, since, in the first case, the elements of the classes are known a priori. In what follows, we show that our method’s effectiveness is comparable to that of a supervised one. This clearly exhibits the superiority of our method.

Entropy 2015, 17 152
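
The abstract does not state the entropy objective or the similarity constraints explicitly, so the following is only a minimal sketch of the quantity the maximum entropy principle works with: the Shannon entropy of a discrete distribution, here applied to the empirical distribution of cluster labels. The function names `shannon_entropy` and `assignment_entropy` and the toy label lists are illustrative assumptions, not the authors' formulation, and no genetic search is shown.

```python
import math
from collections import Counter


def shannon_entropy(probabilities):
    """Shannon entropy (in bits) of a discrete distribution.

    Zero-probability outcomes contribute nothing to the sum.
    """
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def assignment_entropy(labels):
    """Entropy of the empirical distribution of cluster labels.

    Assumption for illustration only: each object carries one integer
    label, and the distribution is the fraction of objects per cluster.
    """
    counts = Counter(labels)
    n = len(labels)
    return shannon_entropy(c / n for c in counts.values())


# Two hypothetical assignments of six objects to three clusters.
balanced = [0, 0, 1, 1, 2, 2]
skewed = [0, 0, 0, 0, 1, 2]

print(assignment_entropy(balanced))
print(assignment_entropy(skewed))
```

With no further constraints, the balanced assignment has the higher entropy, since the uniform distribution over k outcomes attains the maximum of log2(k) bits. In the method described above, the search would additionally be restricted to distributions that satisfy the similarity conditions on each cluster.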

Similar resources

Maximum within-cluster association

This paper addresses a new method and aspect of information-theoretic clustering where we exploit the minimum entropy principle and the quadratic distance measure between probability densities. We present a new minimum entropy objective function which leads to the maximization of within-cluster association. A simple implementation using the gradient ascent method is given. In addition, we show...

Max-Entropy Feed-Forward Clustering Neural Network

The outputs of a non-linear feed-forward neural network are positive, so they can be treated as probabilities when they are normalized to one. If we take the Entropy-Based Principle into consideration, the outputs for each sample can be represented as the distribution of this sample over the different clusters. The Entropy-Based Principle is the principle with which we can estimate the unknown distribution ...

A Non-parametric Maximum Entropy Clustering

Clustering is a fundamental tool for exploratory data analysis. Information theoretic clustering is based on the optimization of information theoretic quantities such as entropy and mutual information. Recently, since these quantities can be estimated in a non-parametric manner, non-parametric information theoretic clustering has gained much attention. Assuming the dataset is sampled from a certain cl...

A Note on the Bivariate Maximum Entropy Modeling

Let X = (X1, X2) be a continuous random vector. Under the assumption that the marginal distributions of X1 and X2 are given, we develop models for the vector X when there is partial information about the dependence structure between X1 and X2. The models, which are obtained based on the well-known Principle of Maximum Entropy, are called the maximum entropy (ME) mo...

Entropy-based Consensus for Distributed Data Clustering

The increasingly larger scale of available data and the more restrictive concerns on their privacy are some of the challenging aspects of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with a consideration for confidentiality of data; i.e. it is the negotiations among local cluster centers that are used in t...

Journal title:
  • Entropy

Volume 17  Issue 

Pages  -

Publication date 2015